CSE 255 Assignment 8

Authors

  • Alexander Asplund
  • William Fedus
Abstract

In this paper we train an L1-regularized linear support vector machine (SVM) to determine whether the sentiment of a movie review is positive or negative. We train and test on the movie review polarity dataset introduced by Pang and Lee, 2004 [2]. We improve the classification accuracy of the linear SVM through a series of experiments with various data preprocessing techniques and data transformations. Accuracy over the 10 cross-validation folds is highest after removing numerical entries and applying log-odds weighting to terms. Our final linear SVM with per-example regularization cost c = 1.00 achieves a classification accuracy of 0.877; this is higher than the 0.864 accuracy reported using subjectivity extracts (Pang and Lee, 2004) and below the 0.905 accuracy reported using linguistic knowledge sources (Ng et al., 2006 [4]).
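
The two preprocessing steps named in the abstract (removing numerical entries and log-odds weighting of terms) can be sketched as follows. The abstract does not give the exact formulation, so this is only one common variant, assuming add-alpha smoothing and a simple alphanumeric tokenizer; the function names `tokenize`, `log_odds_weights`, and `featurize` are hypothetical and not taken from the paper. Training the SVM itself is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop purely numeric ones."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if not t.isdigit()]


def log_odds_weights(pos_docs, neg_docs, alpha=1.0):
    """Smoothed log-odds of each term: log P(t | pos) - log P(t | neg).

    alpha is an assumed add-alpha smoothing constant, not a value from the paper.
    """
    pos_counts = Counter(t for d in pos_docs for t in tokenize(d))
    neg_counts = Counter(t for d in neg_docs for t in tokenize(d))
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    return {
        t: math.log((pos_counts[t] + alpha) / pos_total)
        - math.log((neg_counts[t] + alpha) / neg_total)
        for t in vocab
    }


def featurize(doc, weights):
    """Map a document to {term: count * log-odds weight} for known terms."""
    counts = Counter(tokenize(doc))
    return {t: c * weights[t] for t, c in counts.items() if t in weights}


weights = log_odds_weights(["great great film 10"], ["awful film 2"])
features = featurize("a great film from 1999", weights)
```

Under these assumptions, terms seen mostly in positive reviews receive positive weights and terms seen mostly in negative reviews receive negative ones; the resulting sparse feature dictionaries could then be passed to any linear SVM implementation.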

Similar resources

CSE 255: Assignment 1 - Exploring Musical Tagging

We explore two predictive tasks: (i) a measure of tag probability, and (ii) identifying a minimum tag set for more meaningful music classification on a 100,000 song dataset joined across complementary databases from the 1 Million Song Dataset (“MSD”). We conclude that a tag set size of around 50 tags is most meaningful and report many of our findings/analysis based on the top 50 tags. Using lin...

CSE 255 Assignment 2 Cuisine Prediction/Classification based on ingredients

In this paper, we consider different strategies for identifying the cuisine, given its ingredients. This project aims to explore what combination of ingredients is helpful in identifying a cuisine if the recipe is not given. This has been tackled as a problem of cuisine classification. We also explore different classification algorithms in tandem with approaches like taking combination of multi...

CSE 255 Assignment 1: Helpfulness in Amazon Reviews

In this paper we consider models for predicting the helpfulness rating of Amazon book reviews. We examine features such as the review’s star rating, the length of the review text, the readability of the review text, and the amount of comparisons made in the review. We compare Support Vector Machine and Random Forests models both for regression and classification.

CSE 255 Assignment 2: Upvotes Prediction for Reddit Submissions

In this paper we consider models for predicting the number of upvotes on a Reddit submission. We examine features such as the number of votes, number of comments, time of submission, upvote history of users, images, and subreddits of the submission. We compare Support Vector Regression, Linear Regression, and Gradient Boosting Regression models for predicting the number of upvotes.

Publication date: 2015